Automatic Morpheme Segmentation and Labeling in Universal Dependencies Resources
Abstract
Newer incarnations of the Universal Dependencies (UD) resources feature rich morphological annotation at the word-token level, covering tense, mood, aspect, case, gender, and other grammatical information. This information, however, is not aligned to any part of the word forms in the data. In this work, we present an algorithm for inferring this latent alignment between morphosyntactic labels and substrings of word forms. We evaluate the method on three languages for which we have manually labeled part of the Universal Dependencies data (Finnish, Swedish, and Spanish) and show that the method is robust enough to use for automatic discovery, segmentation, and labeling of allomorphs in the data sets. The model allows us to provide a more detailed morphosyntactic labeling and segmentation of the UD data.
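To make the alignment problem concrete, the following minimal Python sketch reads one token line in the CoNLL-U format used by UD and extracts its morphological features. The example line and the helper name `parse_token` are illustrative assumptions and are not taken from the paper; the sketch shows only the input the abstract describes, not the authors' algorithm.

```python
# Hypothetical CoNLL-U token line (10 tab-separated columns) for the
# Finnish form "taloissa"; constructed for illustration, not copied
# from a UD treebank.
CONLLU_LINE = "1\ttaloissa\ttalo\tNOUN\t_\tCase=Ine|Number=Plur\t0\troot\t_\t_"


def parse_token(line):
    """Return (form, lemma, feats) for one CoNLL-U token line.

    feats is a dict built from the FEATS column (column 6), where
    features are written as Name=Value pairs separated by '|'.
    """
    cols = line.rstrip("\n").split("\t")
    form, lemma, feats_field = cols[1], cols[2], cols[5]
    feats = {}
    if feats_field != "_":  # "_" marks an empty FEATS column
        for pair in feats_field.split("|"):
            name, value = pair.split("=", 1)
            feats[name] = value
    return form, lemma, feats


form, lemma, feats = parse_token(CONLLU_LINE)
print(form, lemma, feats)
```

The features mark the whole token as inessive plural, but nothing in the line indicates which characters of the form carry which feature. Recovering that mapping is the latent alignment the abstract refers to.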
Similar resources
Segmentation Granularity in Dependency Representations for Korean
Previous work on Korean language processing has proposed different basic segmentation units. This paper explores different possible dependency representations for Korean using different levels of segmentation granularity — that is, different schemes for morphological segmentation of tokens into syntactic words. We provide a new Universal Dependencies (UD)-like corpus based on different levels o...
Induction of the Morphology of Natural Language: Unsupervised Morpheme Segmentation with Application to Automatic Speech Recognition
In order to develop computer applications that successfully process natural language data (text and speech), one needs good models of the vocabulary and grammar of as many languages as possible. According to standard linguistic theory, words consist of morphemes, which are the smallest individually meaningful elements in a language. Since an immense number of word forms can be constructed by co...
Cross-lingual Word Segmentation and Morpheme Segmentation as Sequence Labelling
This paper presents our segmentation system developed for the MLP 2017 shared tasks on cross-lingual word segmentation and morpheme segmentation. We model both word and morpheme segmentation as character-level sequence labelling tasks. The prevalent bidirectional recurrent neural network with conditional random fields as the output interface is adapted as the baseline system, which is further i...
Interactive Labeling of Image Segmentation Hierarchies
We study the task of interactive semantic labeling of a given segmentation hierarchy and present a framework consisting of two parts: an automatic component, based on a Conditional Random Field whose dependencies are defined by the inclusion tree of the segmentation hierarchy, and a feedback-loop provided by a human user. Experiments on two data sets show higher classification rates for the pro...
A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images
The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...
Publication date: 2017